Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use trie for prefix matching #851

Merged
merged 13 commits into from
Jul 5, 2019
Merged

use trie for prefix matching #851

merged 13 commits into from
Jul 5, 2019

Conversation

poonai
Copy link
Contributor

@poonai poonai commented Jun 4, 2019

Close #868

This change is Reviewable

@poonai poonai requested a review from a team June 4, 2019 11:38
publisher.go Outdated
if _, ok := updatedIDs[id]; ok {
continue
}
kvs, ok := batchedUpdates[id]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ineffectual assignment to kvs (from ineffassign)

poonai added 2 commits June 4, 2019 17:13
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
@poonai
Copy link
Contributor Author

poonai commented Jun 4, 2019

@gitlw PTAL

I updated the PR.(couldn't able to push the update on your branch)

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)


publisher.go, line 22 at r4 (raw file):

	"sync"

	"github.com/dgraph-io/badger/trie"

minor: move this to in between the pb and y imports.


publisher.go, line 88 at r4 (raw file):

		for _, e := range req.Entries {
			ids := p.indexer.Get(e.Key)
			// trie can give dulplicate ids

start comments with uppercase. also periods.

Same for the other commetns.


publisher.go, line 92 at r4 (raw file):

			// if we searching the trie with hello, we'll be getting [1,1]
			// but we have to send only one update for that we're removing
			// duplicate id

duplicate ids.


publisher.go, line 95 at r4 (raw file):

			updatedIDs := make(map[uint64]struct{})
			for _, id := range ids {
				// ignoring if the entry is already updated for the id

minor: uppercase and period at the end.


publisher.go, line 149 at r4 (raw file):

func (p *publisher) sendUpdates(reqs []*request) {
	// TODO: Prefix check before pushing into pubCh.

I see this TODO removed but there's no code changes nearby. Is this TODO addressed somewhere else?


trie/trie.go, line 4 at r4 (raw file):

type node struct {
	childrens map[byte]*node

s/childrens/children


trie/trie.go, line 15 at r4 (raw file):

}

// Trie datastructure

minor: period at the end.


trie/trie.go, line 20 at r4 (raw file):

}

// NewTrie returns trie

// NewTrie returns a new trie instance.

minor: period at the end.


trie/trie.go, line 27 at r4 (raw file):

}

// Add adds the item in the trie for the given prefix path

minor: period at the end.


trie/trie.go, line 41 at r4 (raw file):

}

// Get returns prefix matched items for the given key

minor: period at the end.


trie/trie.go, line 43 at r4 (raw file):

// Get returns prefix matched items for the given key
func (t *Trie) Get(key []byte) []uint64 {
	o := []uint64{}

what's o? Maybe a more descriptive name would help.


trie/trie.go, line 56 at r4 (raw file):

}

// Delete will delete the item if the item exist in the given index path

minor: period at the end.


trie/trie_test.go, line 18 at r4 (raw file):

	ids := trie.Get([]byte("hel"))
	if len(ids) != 1 {
		t.Errorf("expected number of ids 1 but got %d", len(ids))

use require instead.

require.Equal(t, 1, len(ids))

Same for the rest of the file.


trie/trie_test.go, line 19 at r4 (raw file):

	if len(ids) != 1 {
		t.Errorf("expected number of ids 1 but got %d", len(ids))
		return

Also, don't manually return early from a test.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a few comments.

Reviewed 2 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @ashish-goswami and @Sch00lb0y)


publisher.go, line 93 at r4 (raw file):

			// but we have to send only one update for that we're removing
			// duplicate id
			updatedIDs := make(map[uint64]struct{})

Trie should not be returning duplicate ids.


trie/trie.go, line 1 at r4 (raw file):

package trie

Add license.


trie/trie.go, line 28 at r4 (raw file):

// Add adds the item in the trie for the given prefix path
func (t *Trie) Add(prefix []byte, item uint64) {

s/item/id


trie/trie.go, line 38 at r4 (raw file):

		curr = n
	}
	curr.items = append(curr.items, item)

Might want to check if the id already exists in items.


trie/trie.go, line 53 at r4 (raw file):

		curr = n
	}
	return o

You can sort the output and remove duplicates here.


trie/trie.go, line 68 at r4 (raw file):

	//NOTE: we're just removing the item not the hanging path
	old := curr.items
	new := []uint64{}

Don't call it new to avoid shadowing the keyword.


trie/trie_test.go, line 1 at r4 (raw file):

package trie

license.

- Fixing comment

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @manishrjain, @martinmr, and @Sch00lb0y)


publisher.go, line 88 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

start comments with uppercase. also periods.

Same for the other commetns.

Done.


publisher.go, line 92 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

duplicate ids.

Done.


publisher.go, line 93 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Trie should not be returning duplicate ids.

Done.


publisher.go, line 149 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I see this TODO removed but there's no code changes nearby. Is this TODO addressed somewhere else?

Yes, we're using Trie in publishUpdates


trie/trie.go, line 1 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add license.

Done.


trie/trie.go, line 4 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

s/childrens/children

Done.


trie/trie.go, line 20 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

// NewTrie returns a new trie instance.

minor: period at the end.

Done.


trie/trie.go, line 28 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

s/item/id

Done.


trie/trie.go, line 38 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Might want to check if the id already exists in items.

Done.


trie/trie.go, line 43 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

what's o? Maybe a more descriptive name would help.

Done.


trie/trie.go, line 53 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You can sort the output and remove duplicates here.

Done.


trie/trie.go, line 68 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't call it new to avoid shadowing the keyword.

Done.


trie/trie_test.go, line 18 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

use require instead.

require.Equal(t, 1, len(ids))

Same for the rest of the file.

Done.


trie/trie_test.go, line 19 at r4 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Also, don't manually return early from a test.

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r5.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @Sch00lb0y)


publisher.go, line 92 at r5 (raw file):

				}
				k := y.SafeCopy(nil, e.Key)
				kv := &pb.KV{

KV is meant to not be modified, so we don't need to copy it. We can just create it once and append it to batched updates' KVList.


trie/trie.go, line 68 at r4 (raw file):

Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…

Done.

Same slice trick here as well.


trie/trie.go, line 54 at r5 (raw file):

			curr.children[val] = n
		}
		curr = n
child, ok := node.children[val]
if !ok { child = newNode(); node.children[val] = child}
node = child

trie/trie.go, line 74 at r5 (raw file):

		return out[i] < out[j]
	})
	res := []uint64{}

res := out[:0]

You can use a Slice trick to avoid creating another slice.

https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating


trie/trie.go, line 88 at r5 (raw file):

	curr := t.root
	for _, val := range index {
		n, ok := curr.children[val]

See above about naming of node and child.


trie/trie.go, line 94 at r5 (raw file):

		curr = n
	}
	//NOTE: We're just removing the id not the hanging path.

// We're just removing...

Remove the "NOTE:".

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, @martinmr, and @Sch00lb0y)


publisher.go, line 92 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

KV is meant to not be modified, so we don't need to copy it. We can just create it once and append it to batched updates' KVList.

Done.


trie/trie.go, line 54 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…
child, ok := node.children[val]
if !ok { child = newNode(); node.children[val] = child}
node = child

Done.


trie/trie.go, line 74 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

res := out[:0]

You can use a Slice trick to avoid creating another slice.

https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating

Done.


trie/trie.go, line 88 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

See above about naming of node and child.

Done.


trie/trie.go, line 94 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

// We're just removing...

Remove the "NOTE:".

Done.

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments.

Reviewed 2 of 2 files at r6.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Sch00lb0y)


publisher.go, line 92 at r5 (raw file):

Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…

Done.

That doesn't address it.

I meant if you see multiple subscribers needing this KV, you can create it once and just append the pointer to their KV. You don't need to recreate pb.KV for every one of the ids.


trie/trie.go, line 43 at r4 (raw file):

// Get returns prefix matched items for the given key
func (t *Trie) Get(key []byte) []uint64 {
	o := []uint64{}

Using map would be better. In fact, you can just return the map instead of returning a list. The map can be of type map[uint64]struct{}.

This would avoid having to do the sorting, then removal of duplicates, etc.


trie/trie.go, line 56 at r6 (raw file):

		node = child
	}
	node.ids = append(node.ids, id)

Add a comment that we only need to add id to the last node for this prefix.

@campoy
Copy link
Contributor

campoy commented Jun 14, 2019

Could you please link to an issue describing what this change is fixing?

Thanks!

@animesh2049
Copy link
Contributor


trie/trie.go, line 96 at r6 (raw file):

	// We're just removing the id not the hanging path.
	out := node.ids[:0]
	for i := 0; i < len(node.ids); i++ {

Using range form of loop will be better ?

trie/trie.go Outdated

// Get returns prefix matched ids for the given key.
func (t *Trie) Get(key []byte) map[uint64]struct{} {
out := make(map[uint64]struct{}, 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S1019: should use make(map[uint64]struct{}) instead (from gosimple)

- reuse the kv across all the subscribers

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 3 files reviewed, 6 unresolved discussions (waiting on @animesh2049, @manishrjain, @martinmr, and @Sch00lb0y)


trie/trie.go, line 43 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Using map would be better. In fact, you can just return the map instead of returning a list. The map can be of type map[uint64]struct{}.

This would avoid having to do the sorting, then removal of duplicates, etc.

Done.


trie/trie.go, line 56 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a comment that we only need to add id to the last node for this prefix.

Done.


trie/trie.go, line 96 at r6 (raw file):

Previously, animesh2049 (Animesh Chandra Pathak) wrote…

Using range form of loop will be better ?

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Got a few comments. Address those and get someone in the team to merge it.

Can't wait for you to join the team!

Reviewed 2 of 3 files at r7, 1 of 1 files at r8.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)


publisher.go, line 87 at r8 (raw file):

		for _, e := range req.Entries {
			ids := p.indexer.Get(e.Key)
			if len(ids) != 0 {

if len(ids) > 0


publisher.go, line 98 at r8 (raw file):

				for id := range ids {
					if _, ok := batchedUpdates[id]; !ok {
						batchedUpdates[id] = &pb.KVList{}

Instead of creating an empty KVList every time, you can create an empty KVList above as a const or var, and just assign these to it.


trie/trie.go, line 56 at r6 (raw file):

Previously, sch00lb0y (பாலாஜி ஜின்னா) wrote…

Done.

I don't see your comment. Why did you mark this as "Done"?

poonai added 2 commits June 20, 2019 22:22
Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
- change not equal

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 6 unresolved discussions (waiting on @animesh2049, @manishrjain, and @Sch00lb0y)


publisher.go, line 87 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

if len(ids) > 0

Done.


publisher.go, line 98 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead of creating an empty KVList every time, you can create an empty KVList above as a const or var, and just assign these to it.

Sorry, I didn't understand this part :(

Could you explain this part?


trie/trie.go, line 56 at r6 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I don't see your comment. Why did you mark this as "Done"?

Sorry, some mix-up happened. Instead of Add, I added the comment to Get :(

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r9.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @animesh2049 and @Sch00lb0y)

@poonai poonai merged commit c32e701 into hypermodeinc:master Jul 5, 2019
@poonai poonai deleted the trie branch July 5, 2019 17:35
jarifibrahim pushed a commit that referenced this pull request Mar 12, 2020
* Add some TODOs for pubsub API.

* Making the Loader API public

* Adding the trie data structure for publishers, but got stuck for deleting subscribers from the trie

* use trie for prefix matching

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* delete id from trie

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* fix ineffassign

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* - trie return distinct element.
- Fixing comment

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* remove allocation for finding distict element

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* change to the convention

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* - use map instead of sorting to find distinct element
- reuse the kv across all the subscribers

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>

* - add comment
- change not equal

Signed-off-by: பாலாஜி <rbalajis25@gmail.com>
(cherry picked from commit c32e701)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

use trie to prefix match for the subscriber API
8 participants